Measuring NUMA effects with the STREAM benchmark

نویسنده

  • Lars Bergstrom
چکیده

Modern high-end machines feature multiple processor packages, each of which contains multiple independent cores and integrated memory controllers connected directly to dedicated physical RAM. These packages are connected via a shared bus, creating a system with a heterogeneous memory hierarchy. Since this shared bus has less bandwidth than the sum of the links to memory, aggregate memory bandwidth is higher when parallel threads all access memory local to their processor package than when they access memory attached to a remote package. But, the impact of this heterogeneous memory architecture is not easily understood from vendor benchmarks. Even where these measurements are available, they provide only best-case memory throughput. This work presents a series of modifications to the well-known STREAM benchmark to measure the effects of NUMA on both a 48-core AMD Opteron machine and a 32-core Intel Xeon machine.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

MAP-numa: Access Patterns Used to Characterize the NUMA Memory Access Optimization Techniques and Algorithms

Some typical memory access patterns are provided and programmed in C, which can be used as benchmark to characterize the various techniques and algorithms aim to improve the performance of NUMA memory access. These access patterns, called MAP-numa (Memory Access Patterns for NUMA), currently include three classes, whose working data sets are corresponding to 1-dimension array, 2-dimension matri...

متن کامل

Lock cohorting: A general technique for designing NUMA locks Citation

Multicore machines are quickly shifting to NUMA and CC-NUMA architectures, making scalable NUMA-aware locking algorithms, ones that take into account the machines’ non-uniform memory and caching hierarchy, ever more important. This paper presents lock cohorting, a general new technique for designing NUMA-aware locks that is as simple as it is powerful. Lock cohorting allows one to transform any...

متن کامل

Task Parallel Models Based on Dynamic Data Placement to Reduce NUMA Effects

NUMA (Non-Uniform Memory Access) multicore computers become popular in scientific and industrial fields due to its scalable memory performance. However, large-scale intensive data computing on NUMA architecture are facing up to the challenges in data locality problems called NUMA effects that are caused by the overhead accesses of cross-node data. Our task parallel model bases on the strategy o...

متن کامل

Memory Latency Reduction with Fine-grain Migrating Threads in Numa Shared-memory Multiprocessors

In order to fully realize the potential performance benefits of large-scale NUMA shared memory multiprocessors, efficient techniques to reduce/tolerate long memory access latencies in such systems are to be developed. This paper discusses the concept, software and hardware support for memory latency reduction through fine-grain non-transparent thread migration, referred to as mobile multithread...

متن کامل

Experimental Evaluation of NUMA Effects on Database Management Systems

NUMA systems with multiple CPUs and large main memories are common today. Consequently, database management systems (DBMSs) in data centers are deployed on NUMA systems. They serve a wide range of database use-cases, single large applications having high performance needs as well as many small applications that are consolidated on one machine to save resources and increase utilization. Database...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1103.3225  شماره 

صفحات  -

تاریخ انتشار 2011